Instance-based Sentence Boundary Determination by Optimization for Natural Language Generation

Authors

  • Shimei Pan
  • James Shaw
Abstract

This paper describes a novel instance-based sentence boundary determination method for natural language generation that optimizes a set of criteria based on examples in a corpus. Compared to existing sentence boundary determination approaches, our work offers three significant contributions. First, our approach provides a general, domain-independent framework that effectively addresses sentence boundary determination by balancing a comprehensive set of sentence complexity and quality-related constraints. Second, our approach can simulate the characteristics and the style of naturally occurring sentences in an application domain, since our solutions are optimized based on their similarities to examples in a corpus. Third, our approach can adapt easily to suit a natural language generation system's capability by balancing the strengths and weaknesses of its subcomponents (e.g., its aggregation and referring expression generation capability). Our final evaluation shows that the proposed method results in significantly better sentence generation outcomes than a widely adopted approach.

Similar resources

Generating Natural Sentences by Using Shallow Discourse Information

One of the biggest defects of natural language generation systems is that the output sentences are unnatural and contain many redundancies. Machine translation (MT) users, for instance, often get tired of reading the output of MT because of this problem. In this paper, we summarize the results of our analysis of human translation in terms of the use of discourse information to generate target-l...

Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking

In spoken language, sentence boundaries are much less explicit than in written language. Since conventional natural language processing (NLP) techniques are generally designed assuming the sentence boundaries are already given, it is crucial to detect the boundaries accurately for applying such NLP techniques to spoken language. Classification frameworks, such as Support Vector Machines (SVMs) ...

Building applied natural language generation systems

In this article, we give an overview of Natural Language Generation (NLG) from an applied system-building perspective. The article includes a discussion of when NLG techniques should be used; suggestions for carrying out requirements analyses; and a description of the basic NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generat...

Generate Compressed Sentences with Stanford Typed Dependencies towards Abstractive Summarization

In this paper, we implement the sentence generation process toward abstractive summarization proposed by Genest and Lapalme (2010). We simply use Stanford Typed Dependencies to extract information items and generate multiple compressed sentences via a Natural Language Generation engine. Then we follow LexRank-based sentence ranking combined with greedy sentence selection to build...

Optimizing question answering systems by Accelerated Particle Swarm Optimization (APSO)

One of the most important research areas in natural language processing is Question Answering Systems (QASs). Existing search engines, with Google at the top, have many remarkable capabilities. But there is a basic limitation (search engines do not have deduction capability), a capability which a QAS is expected to have. In this perspective, a search engine may be viewed as a semi-mechanized QA...

Publication date: 2005